mean field
0b0d29e5d5c8a7a25dced6405bd022a9-Supplemental.pdf
We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs). The algorithm optimizes a nonconvex continuous relaxation of the CRF inference problem using vanilla Frank-Wolfe with approximate updates, which are equivalent to minimizing a regularized energy function. Our proposed method is a generalization of existing algorithms such as mean field or concave-convex procedure. This perspective not only offers a unified analysis of these algorithms, but also allows an easy way of exploring different variants that potentially yield better performance. We illustrate this in our empirical results on standard semantic segmentation datasets, where several instantiations of our regularized Frank-Wolfe outperform mean field inference, both as a standalone component and as an end-to-end trainable layer in a neural network. We also show that dense CRFs, coupled with our new algorithms, produce significant improvements over strong CNN baselines.
0b0d29e5d5c8a7a25dced6405bd022a9-Paper.pdf
We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs). The algorithm optimizes a nonconvex continuous relaxation of the CRF inference problem using vanilla Frank-Wolfe with approximate updates, which are equivalent to minimizing a regularized energy function. Our proposed method is a generalization of existing algorithms such as mean field or concave-convex procedure. This perspective not only offers a unified analysis of these algorithms, but also allows an easy way of exploring different variants that potentially yield better performance. We illustrate this in our empirical results on standard semantic segmentation datasets, where several instantiations of our regularized Frank-Wolfe outperform mean field inference, both as a standalone component and as an end-to-end trainable layer in a neural network. We also show that dense CRFs, coupled with our new algorithms, produce significant improvements over strong CNN baselines.
Learning in Stackelberg Mean Field Games: A Non-Asymptotic Analysis
Zeng, Sihan, Evans, Benjamin Patrick, Bhatt, Sujay, Ardon, Leo, Ganesh, Sumitra, Koppel, Alec
We study policy optimization in Stackelberg mean field games (MFGs), a hierarchical framework for modeling the strategic interaction between a single leader and an infinitely large population of homogeneous followers. The objective can be formulated as a structured bi-level optimization problem, in which the leader needs to learn a policy maximizing its reward, anticipating the response of the followers. Existing methods for solving these (and related) problems often rely on restrictive independence assumptions between the leader's and followers' objectives, use samples inefficiently due to nested-loop algorithm structure, and lack finite-time convergence guarantees. To address these limitations, we propose AC-SMFG, a single-loop actor-critic algorithm that operates on continuously generated Markovian samples. The algorithm alternates between (semi-)gradient updates for the leader, a representative follower, and the mean field, and is simple to implement in practice. We establish the finite-time and finite-sample convergence of the algorithm to a stationary point of the Stackelberg objective. To our knowledge, this is the first Stackelberg MFG algorithm with non-asymptotic convergence guarantees. Our key assumption is a "gradient alignment" condition, which requires that the full policy gradient of the leader can be approximated by a partial component of it, relaxing the existing leader-follower independence assumption. Simulation results in a range of well-established economics environments demonstrate that AC-SMFG outperforms existing multi-agent and MFG learning baselines in policy quality and convergence speed.
Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues
Variational approximation has been widely used in large-scale Bayesian inference recently, the simplest kind of which involves imposing a mean field assumption to approximate complicated latent structures. Despite the computational scalability of mean field, theoretical studies of its loss function surface and the convergence behavior of iterative updates for optimizing the loss are far from complete. In this paper, we focus on the problem of community detection for a simple two-class Stochastic Blockmodel (SBM). Using batch co-ordinate ascent (BCAVI) for updates, we give a complete characterization of all the critical points and show different convergence behaviors with respect to initializations. When the parameters are known, we show a significant proportion of random initializations will converge to ground truth. On the other hand, when the parameters themselves need to be estimated, a random initialization will converge to an uninformative local optimum.
From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models
The application of Reinforcement Learning (RL) to economic modeling reveals a fundamental conflict between the assumptions of equilibrium theory and the emergent behavior of learning agents. While canonical economic models assume atomistic agents act as `takers' of aggregate market conditions, a naive single-agent RL simulation incentivizes the agent to become a `manipulator' of its environment. This paper first demonstrates this discrepancy within a search-and-matching model with concave production, showing that a standard RL agent learns a non-equilibrium, monopsonistic policy. Additionally, we identify a parametric bias arising from the mismatch between economic discounting and RL's treatment of intertemporal costs. To address both issues, we propose a calibrated Mean-Field Reinforcement Learning framework that embeds a representative agent in a fixed macroeconomic field and adjusts the cost function to reflect economic opportunity costs. Our iterative algorithm converges to a self-consistent fixed point where the agent's policy aligns with the competitive equilibrium. This approach provides a tractable and theoretically sound methodology for modeling learning agents in economic systems within the broader domain of computational social science.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper introduces a full model for tracking while allowing for multiple and varying number of hypothesis and clutter. It promises a clear notation and fast algorithms through the use of variational/Baum-Welch type inference. Experiments appear extensive and are performed on real-world data. The key novelty of this paper is the assignment problem (aka data association). Tracking itself, as the authors acknowledge, is a well-trodden field.
Networked Communication for Decentralised Cooperative Agents in Mean-Field Control
Benjamin, Patrick, Abate, Alessandro
We introduce networked communication to mean-field control (MFC) - the cooperative counterpart to mean-field games (MFGs) - and in particular to the setting where decentralised agents learn online from a single, non-episodic run of the empirical system. We adapt recent algorithms for MFGs to this new setting, as well as contributing a novel sub-routine allowing networked agents to estimate the global average reward from their local neighbourhood. We show that the networked communication scheme allows agents to increase social welfare faster than under both the centralised and independent architectures, by computing a population of potential updates in parallel and then propagating the highest-performing ones through the population, via a method that can also be seen as tackling the credit-assignment problem. We prove this new result theoretically and provide experiments that support it across numerous games, as well as exploring the empirical finding that smaller communication radii can benefit convergence in a specific class of game while still outperforming agents learning entirely independently. We provide numerous ablation studies and additional experiments on numbers of communication round and robustness to communication failures.
Reviews: CRF-CNN: Modeling Structured Information in Human Pose Estimation
The general idea of this work is clearly in a direction of interest to the community and the results look strong. However, there are a few aspects to this work that I find quite unclear. If they were clearer this work would have much more potential for impact. In particular, it is not clear enough if this work uses a truly'end-to-end' approach (as stated for one of the contributions). For example on line 46 it is stated that: "We show step by step how approximations are made to use an end-to-end learning CNN for implementing such CRF model."
Learning in Mean Field Games: A Survey
Laurière, Mathieu, Perrin, Sarah, Pérolat, Julien, Girgin, Sertan, Muller, Paul, Élie, Romuald, Geist, Matthieu, Pietquin, Olivier
Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases. Introduced by Lasry and Lions, and Huang, Caines and Malham\'e, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely on solving partial or stochastic differential equations with a full knowledge of the model. Recently, Reinforcement Learning (RL) has appeared promising to solve complex problems at scale. The combination of RL and MFGs is promising to solve games at a very large scale both in terms of population size and environment complexity. In this survey, we review the quickly growing recent literature on RL methods to learn equilibria and social optima in MFGs. We first identify the most common settings (static, stationary, and evolutive) of MFGs. We then present a general framework for classical iterative methods (based on best-response computation or policy evaluation) to solve MFGs in an exact way. Building on these algorithms and the connection with Markov Decision Processes, we explain how RL can be used to learn MFG solutions in a model-free way. Last, we present numerical illustrations on a benchmark problem, and conclude with some perspectives.